Integrating Conflicting Data: The Role of Source Dependence

Authors

  • Xin Dong
  • Laure Berti-Équille
  • Divesh Srivastava
Abstract

Many data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, require integrating data from multiple sources. Each of these sources provides a set of values and different sources can often provide conflicting values. To present quality data to users, it is critical that data integration systems can resolve conflicts and discover true values. Typically, we expect a true value to be provided by more sources than any particular false one, so we can take the value provided by the majority of the sources as the truth. Unfortunately, a false value can be spread through copying and that makes truth discovery extremely tricky. In this paper, we consider how to find true values from conflicting information when there are a large number of sources, among which some may copy from others. We present a novel approach that considers dependence between data sources in truth discovery. Intuitively, if two data sources provide a large number of common values and many of these values are rarely provided by other sources (e.g., particular false values), it is very likely that one copies from the other. We apply Bayesian analysis to decide dependence between sources and design an algorithm that iteratively detects dependence and discovers truth from conflicting information. We also extend our model by considering accuracy of data sources and similarity between values. Our experiments on synthetic data as well as real-world data show that our algorithm can significantly improve accuracy of truth discovery and is scalable when there are a large number of data sources.
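
The effect of copying on majority voting can be illustrated with a minimal sketch. This is not the paper's Bayesian algorithm: the toy data, the function name `majority_vote`, and the assumption that S4 and S5 are already known to copy S3 are all hypothetical, chosen only to show why counting copied values as independent votes can elect a false value.

```python
from collections import Counter

# Hypothetical toy data: source -> {object: claimed value}.
# Assumption for illustration: S4 and S5 are verbatim copies of S3.
claims = {
    "S1": {"w": "H", "x": "A", "y": "B", "z": "C"},
    "S2": {"w": "K", "x": "A", "y": "B", "z": "C"},
    "S3": {"w": "H", "x": "E", "y": "B", "z": "F"},
    "S4": {"w": "H", "x": "E", "y": "B", "z": "F"},
    "S5": {"w": "H", "x": "E", "y": "B", "z": "F"},
}


def majority_vote(claims, ignore=()):
    """Pick, per object, the value claimed by the most non-ignored sources."""
    votes = {}
    for source, values in claims.items():
        if source in ignore:
            continue
        for obj, value in values.items():
            votes.setdefault(obj, Counter())[value] += 1
    return {obj: counter.most_common(1)[0][0] for obj, counter in votes.items()}


# Naive voting treats every source as independent, so the copied
# values "E" and "F" win on objects x and z.
naive = majority_vote(claims)

# If the copiers are discounted (here simply ignored, which is a crude
# stand-in for a dependence-aware vote), the other values prevail.
discounted = majority_vote(claims, ignore={"S4", "S5"})

print(naive)
print(discounted)
```

Dropping the suspected copiers entirely is the simplest possible discount. The abstract describes something more refined: dependence is inferred from shared values that few other sources provide, and detection and truth discovery are iterated.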

Journal:
  • PVLDB

Volume 2

Publication date: 2009